227 research outputs found

    Toward Word Embedding for Personalized Information Retrieval

    This paper presents preliminary work on using word embeddings (word2vec) for query expansion in the context of Personalized Information Retrieval. Traditionally, word embeddings are learned on a general corpus, such as Wikipedia. In this work we personalize the learning of the word embeddings by training them on the user's profile, so that the embeddings reflect the same context as the user's interests. Our proposal is evaluated on the CLEF Social Book Search 2016 collection. The results show that further effort is needed in how word embeddings are applied in the context of Personalized Information Retrieval.
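    The kind of embedding-based query expansion described above can be sketched as follows. The vector table is a hypothetical stand-in for a word2vec model trained on the user's profile; all terms and values are illustrative, not from the paper.

```python
import numpy as np

# Toy embeddings standing in for word2vec vectors trained on a user's
# profile (hypothetical values; a real system would train a word2vec
# model on the profile text instead).
EMB = {
    "book":    np.array([0.9, 0.1, 0.0]),
    "novel":   np.array([0.8, 0.2, 0.1]),
    "recipe":  np.array([0.0, 0.9, 0.3]),
    "cooking": np.array([0.1, 0.8, 0.4]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_query(terms, k=1):
    """Append the k nearest vocabulary terms to each query term."""
    expanded = list(terms)
    for t in terms:
        if t not in EMB:
            continue
        scored = [(cosine(EMB[t], EMB[w]), w)
                  for w in EMB if w != t and w not in terms]
        scored.sort(reverse=True)
        expanded.extend(w for _, w in scored[:k])
    return expanded

print(expand_query(["book"]))  # → ['book', 'novel']
```

    Training the embeddings on the profile rather than on Wikipedia changes which neighbors are found, which is exactly the personalization lever the paper investigates.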

    Correspondances compatibles avec les fichiers inverses pour la recherche d'information.

    National audience. This article revisits one of the major components of an information retrieval system: matching based on inverted files, since the translation of a theoretical formula into an implementation compatible with inverted files is rarely made explicit in publications. We propose a more formal definition of what makes a matching formula compatible with inverted files, at two levels of compatibility. We study the most classical models and verify their compatibility with inverted files. We then derive, from a Jensen-Shannon matching formula that is initially not inverted-file compatible, two inverted-file-compatible formulas, one at each level. A simple experiment on an image corpus shows that the classical Kullback-Leibler divergence performs worse than the inverted-file-compatible Jensen-Shannon divergence.
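    The compatibility property at stake can be illustrated on a standard example, the Jelinek-Mercer query-likelihood model (not the paper's Jensen-Shannon formulas): the score can be rewritten so that only terms present in the document contribute beyond a document-independent constant, which is exactly what a postings-list traversal needs. A minimal sketch on a toy corpus:

```python
import math
from collections import Counter

# Toy corpus; an inverted-file-compatible score must be computable by
# iterating only over the postings of the query terms.
DOCS = {
    "d1": Counter("the cat sat on the mat".split()),
    "d2": Counter("the dog ate the bone".split()),
}
# Background (collection) language model.
COLL = Counter()
for c in DOCS.values():
    COLL.update(c)
N = sum(COLL.values())

def lm_score(query, doc_id, lam=0.5):
    """Query likelihood with Jelinek-Mercer smoothing, rewritten as
      log p(q|d) = sum_t log(1 + lam*p(t|d) / ((1-lam)*p(t|C)))
                   + sum_t log((1-lam)*p(t|C))
    The second sum is document-independent, so only terms with tf > 0
    in the document contribute to the rank-relevant part below."""
    tf = DOCS[doc_id]
    dl = sum(tf.values())
    score = 0.0
    for t in query:
        p_c = COLL[t] / N
        score += math.log(1 + lam * (tf[t] / dl) / ((1 - lam) * p_c))
    return score  # rank-equivalent part; the constant is dropped

ranked = sorted(DOCS, key=lambda d: lm_score(["cat", "mat"], d), reverse=True)
print(ranked)  # d1, which contains both query terms, ranks first
```

    Documents containing none of the query terms score exactly zero here, so an inverted-file traversal never needs to touch them; the article's point is that such rewrites are not always possible and deserve explicit treatment.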

    Contrainte de correspondance Document-Document pour la RI. Application à la Divergence de Kullback-Leibler

    National audience. This paper defines a new axiomatic constraint for information retrieval, DDMC (Document-Document Matching Constraint), which describes the expected behavior of a matching function when a corpus document is used as a query. DDMC is not verified by a classical IR model, the language model based on Jelinek-Mercer smoothing and the Kullback-Leibler divergence. We introduce a modification of this model that validates DDMC. Experiments conducted on two corpora show that the modification of the reference model does not significantly degrade the results while guaranteeing DDMC.

    Temporal re-scoring vs. temporal descriptors for semantic indexing of videos

    International audience. The automated indexing of images and videos is a difficult problem because of the "distance" between the arrays of numbers encoding these documents and the concepts (e.g. people, places, events or objects) with which we wish to annotate them. Methods exist for this, but their results are far from satisfactory in terms of generality and accuracy. Existing methods typically use a single set of training examples and consider it uniform. This is not optimal, because the same concept may appear in various contexts and its appearance may differ greatly depending on those contexts. Context has been widely used in the state of the art to treat various problems; for videos, the temporal context seems the most crucial and the most effective. In this paper, we present a comparative study of two methods exploiting the temporal context for semantic video indexing. The proposed approaches use temporal information derived from two different sources: low-level content and semantic information. Our experiments on the TRECVID'12 collection show interesting results that confirm the usefulness of the temporal context and demonstrate which of the two approaches is more effective.
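    A minimal sketch of the low-level idea behind temporal re-scoring: blend each shot's concept-detection score with those of its temporal neighbors. The window size and blending weight are illustrative, not the paper's actual parameters.

```python
def temporal_rescore(scores, window=1, alpha=0.5):
    """Blend each shot's concept score with the mean score of its
    temporal neighbors (shots within +/- window positions)."""
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neigh = [scores[j] for j in range(lo, hi) if j != i]
        if not neigh:                       # isolated shot: keep as-is
            out.append(scores[i])
            continue
        ctx = sum(neigh) / len(neigh)
        out.append((1 - alpha) * scores[i] + alpha * ctx)
    return out

# An isolated detection spike is damped, and its neighbors are lifted,
# reflecting the assumption that concepts persist across adjacent shots.
print(temporal_rescore([0.0, 1.0, 0.0]))  # → [0.5, 0.5, 0.5]
```

    The alternative the paper compares against, temporal descriptors, would instead feed neighboring-shot features into the classifier itself rather than post-processing its scores.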

    The Outline of an 'Intelligent' Image Retrieval Engine

    International audience. The first image retrieval systems have the advantage of being fully automatic, and thus scalable to large collections of images, but are restricted to the representation of low-level aspects (e.g. colors, textures...) without considering the semantic content of images. This obviously compromises interaction, making it difficult for a user to query with precision. The growing need for 'intelligent' systems, i.e. systems capable of bridging this semantic gap, leads to new architectures combining multiple characterizations of the image content. This paper presents SIR, a promising high-level framework featuring semantic, signal color and spatial characterizations. It features a fully-textual query module based on a language manipulating both boolean and quantification operators, making it possible for a user to request elaborate image scenes such as a "covered (mostly grey) sky" or "people in front of a building".

    A Model for Weighting Image Objects in Home Photographs

    International audience. This paper presents a contribution to image indexing: a weighting model for visible objects, or image objects, in home photographs. To improve its effectiveness, the weighting model has been designed according to human perception criteria about what is considered important in photographs. Four basic hypotheses related to human perception are presented, and their validity is assessed against observations from a user study. Finally, a formal definition of the weighting model is given and its consistency with the user study is evaluated.

    LIG at CLEF 2015 SBS Lab

    International audience. This paper describes the work achieved by the MRIM research group of Grenoble, using some data from the LaHC of Saint-Étienne, to test personalized retrieval of books for the Social Book Search Lab of CLEF 2015. Our proposal relies on a biased fusion of content-only retrieval (using the BM25F and LGD retrieval models), a non-social user profile based on the requester's catalog, and social profiles using user/user links generated from their catalogs and book ratings. The official results show a clear positive impact of the user profile and a small positive impact of the social elements we used. Post-official results presenting unbiased fusion scores are also included.
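    The biased fusion mentioned above can be sketched as a weighted linear combination of per-system document scores (CombSUM-style). The weights and scores below are made up for illustration; the paper's actual components are BM25F/LGD content scores and profile-based scores.

```python
def fuse(score_lists, weights):
    """Weighted linear (biased) fusion: a document's fused score is the
    weighted sum of its scores across all component systems."""
    fused = {}
    for scores, w in zip(score_lists, weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return fused

# Hypothetical per-book scores from a content run and a profile run.
content = {"b1": 2.0, "b2": 1.0}
profile = {"b2": 3.0, "b3": 0.5}
print(fuse([content, profile], [0.7, 0.3]))  # b2 gets credit from both runs
```

    A biased fusion simply means the weights are not equal; the post-official runs mentioned in the abstract correspond to the unbiased case where all components receive the same weight.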

    Integrating Perceptual Signal Features within a Multi-facetted Conceptual Model for Automatic Image Retrieval

    International audience. The majority of content-based image retrieval (CBIR) systems are restricted to the representation of signal aspects, e.g. color, texture..., without explicitly considering the semantic content of images. Under these approaches a sun, for example, is represented by an orange or yellow circle, but not by the term "sun". Signal-oriented solutions are fully automatic, and thus easily usable on substantial amounts of data, but they do not bridge the existing gap between the extracted low-level features and semantic descriptions. This obviously penalizes qualitative and quantitative performance in terms of recall and precision, and therefore users' satisfaction. Another class of methods, tested within the framework of the Fermi-GC project, consisted in modeling the content of images through a precise process of human-assisted indexing. This approach, based on an elaborate representation model (the conceptual graph formalism), provides satisfactory results during the retrieval phase but is not easily usable on large collections of images because of the human intervention required for indexing. The contribution of this paper is twofold. First, to achieve more efficiency as far as user interaction is concerned, we highlight a bond between these two classes of image retrieval systems and integrate signal and semantic features within a unified conceptual framework. Second, as opposed to state-of-the-art relevance feedback systems dealing with this integration, we propose a representation formalism supporting it, which allows us to specify a rich query language combining both semantic and signal characterizations. We validate our approach through quantitative (recall-precision curves) evaluations.

    Annotation de vidéos par paires rares de concepts

    National audience. Detecting a visual concept in videos is a difficult task, especially for rare concepts or for those that are hard to describe visually. The problem becomes even harder when detecting a pair of concepts instead of a single one: the more concepts present in a video scene, the more visually complex the scene, and the harder it is to find a specific description for it. Two main directions can be followed to tackle this problem: 1) detect each concept separately, then combine the predictions of the corresponding detectors in a manner similar to what is often done in information retrieval; or 2) consider the pair as a new concept and train a supervised classifier for it, inferring new annotations from those of the two concepts forming the pair. Each approach has its advantages and drawbacks. The major problem of the second method is the need for annotated data, especially for the positive class. For rare concepts, this scarcity is even more pronounced for the pairs formed from their combinations; conversely, two fairly frequent concepts may very rarely co-occur in the same document. Some prior work has proposed to alleviate this problem by collecting representative examples of the studied classes from the web, but this remains costly in time and money. We compared the two types of approaches without resorting to external resources.
    Our evaluation was carried out within the "concept pair detection" subtask of the semantic indexing (SIN) task of TRECVID 2013. The results showed that for videos, without external information resources, approaches that fuse the outputs of the two detectors perform better, contrary to what previous work has shown for still images. The described methods outperform the best official result of the aforementioned evaluation campaign by 9% in relative gain in mean average precision (MAP).
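    The first direction, fusing the outputs of the two individual detectors, can be sketched with standard late-fusion rules. The combination rules and scores below are generic illustrations; the paper's exact fusion operators are not reproduced here.

```python
# Late fusion of two concept detectors' scores for a concept pair
# (standard combination rules; scores are hypothetical probabilities).
def fuse_pair(p_a, p_b, method="product"):
    """Combine per-shot detection scores for concepts A and B."""
    if method == "product":     # both concepts must score high
        return p_a * p_b
    if method == "min":         # limited by the weaker detector
        return min(p_a, p_b)
    if method == "avg":         # lenient average of the two
        return (p_a + p_b) / 2
    raise ValueError(f"unknown fusion method: {method}")

print(fuse_pair(0.8, 0.5))          # → 0.4
print(fuse_pair(0.8, 0.5, "min"))   # → 0.5
```

    The second direction would instead build one training set for the pair itself, which is exactly where the scarcity of jointly-annotated positives described above bites.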

    Dynamic Learning of Indexing Concepts for Home Image Retrieval

    International audience. This paper presents a component of a content-based image retrieval system that lets a user define the indexing terms used later during retrieval. The user inputs an indexing term name together with image examples and counter-examples of the term, and the system learns a model of the concept as well as a similarity measure for this term. The similarity measure is based on weights reflecting the importance of each low-level feature extracted from the images. The system computes these weights using a genetic algorithm. A particular similarity measure is rated by clustering the examples and counter-examples using these weights and computing the quality of the obtained clusters. Experiments are conducted and results are presented on a set of 600 images.
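    A simplified sketch of the idea: evolve per-feature weights whose fitness rewards separating examples from counter-examples. The paper's actual cluster-quality criterion and genetic operators are not reproduced here; the data, fitness, and parameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy low-level features: feature 0 separates the classes, feature 1 is
# noise (hypothetical data; the paper clusters examples/counter-examples
# and scores the quality of the resulting clusters).
POS = np.array([[0.9, 0.3], [0.8, 0.7]])   # examples of the concept
NEG = np.array([[0.1, 0.5], [0.2, 0.6]])   # counter-examples

def fitness(w):
    """Reward weights under which counter-examples sit far from the
    positive centroid while examples stay close to it."""
    w = w / (w.sum() + 1e-12)              # normalize so scale cannot cheat
    c = POS.mean(axis=0)
    d_pos = np.mean(np.sqrt(((POS - c) ** 2 * w).sum(axis=1)))
    d_neg = np.mean(np.sqrt(((NEG - c) ** 2 * w).sum(axis=1)))
    return d_neg - d_pos

def evolve(pop=20, gens=30):
    """Minimal (mu+lambda)-style loop: mutate, then keep the fittest."""
    ws = rng.random((pop, 2))
    for _ in range(gens):
        children = np.clip(ws + rng.normal(0.0, 0.1, ws.shape), 0.0, None)
        both = np.vstack([ws, children])
        order = np.argsort([-fitness(w) for w in both])
        ws = both[order][:pop]
    return ws[0] / ws[0].sum()

best = evolve()
print(best)  # most of the weight lands on the discriminative feature
```

    The evolved weights then parameterize the term's similarity measure, so features that matter for the user-defined concept dominate the distance computation at retrieval time.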